Speeding-up one-versus-all training for extreme classification via mean-separating initialization

Authors

Abstract

In this paper, we show that a simple, data-dependent way of setting the initial vector can be used to substantially speed up the training of linear one-versus-all classifiers in extreme multi-label classification (XMC). We discuss the problem of choosing the initial weights from the perspective of three goals: we want to start in a region of weight space (a) with a low loss value, (b) that is favourable for second-order optimization, and (c) where the conjugate-gradient (CG) calculations can be performed quickly. For margin losses, such an initialization is achieved by selecting the initial vector so that it separates the mean of all positive (relevant for the label) instances from the mean of all negatives – two quantities that can be calculated quickly for the highly imbalanced binary problems occurring in XMC. We demonstrate a speedup of $$5\times$$ on the Amazon-670K dataset with 670,000 labels. This comes in part from the reduced number of iterations that need to be performed due to starting closer to the solution, and in part from an implicit negative-mining effect that makes it possible to ignore easy negatives in the CG step. Because of the convex nature of the optimization problem, the speedup is achieved without any degradation in classification accuracy. The implementation can be found at https://github.com/xmc-aalto/dismecpp .
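For a margin loss with targets $$\pm 1$$, one concrete way to realize such a mean-separating start is to point the initial vector along the difference of the two class means and scale it so that the means project exactly to $$+1$$ and $$-1$$. The sketch below illustrates this for a single binary one-versus-all subproblem; the function name, the explicit bias term, and the exact scaling are assumptions of this illustration, not the API of the linked dismecpp implementation.

```python
import numpy as np

def mean_separating_init(X, y):
    """Sketch of a mean-separating start for one binary OVA subproblem.

    Returns (w, b) with <w, mu_pos> + b = +1 and <w, mu_neg> + b = -1,
    where mu_pos / mu_neg are the means of positive / negative instances.
    """
    mu_pos = X[y == 1].mean(axis=0)  # mean of the (few) positives
    mu_neg = X[y == 0].mean(axis=0)  # mean of the (many) negatives
    diff = mu_pos - mu_neg
    # Scale so the projections of the two means are exactly 2 apart.
    w = 2.0 * diff / (diff @ diff)
    # Bias centers the two projected means at +1 and -1.
    b = -w @ (mu_pos + mu_neg) / 2.0
    return w, b

# Toy usage on a highly imbalanced binary problem, as in XMC.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 50))
y = (rng.random(10_000) < 0.001).astype(int)  # ~0.1% positives
y[0] = 1  # guarantee at least one positive instance
w, b = mean_separating_init(X, y)
print((X[y == 1] @ w + b).mean())  # +1.0 by construction
print((X[y == 0] @ w + b).mean())  # -1.0 by construction
```

Since each label's negative set in XMC is nearly the whole dataset, the overall instance mean can presumably be computed once and shared across all binary subproblems, which is what makes the two required means cheap to obtain per label.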

Similar articles

Speeding up ResNet training

Time required for model training is an important limiting factor for the pace of progress in the field of deep learning. The faster the model training, the more options researchers can try in the same amount of time, and the higher the quality of their results. In this work we stacked a set of techniques to optimize the training time of the ResNet model with 20 layers and achieved a subs...

Speeding up the binary Gaussian process classification

Gaussian processes (GP) are attractive building blocks for many probabilistic models. Their drawbacks, however, are the rapidly increasing inference time and memory requirements as the amount of data grows. The problem can be alleviated with compactly supported (CS) covariance functions, which produce sparse covariance matrices that are fast in computations and cheap to store. CS functions have pr...

Speeding Up Dijkstra's Algorithm for All Pairs Shortest Paths

We present a technique for reducing the number of edge scans performed by Dijkstra's algorithm for computing all pairs shortest paths. The main idea is to restrict path scanning only to locally shortest paths, i.e., paths whose proper subpaths are shortest paths. On a directed graph with n vertices and m edges, the technique we discuss allows us to reduce the number of edge scans from O(mn) to ...

Speeding up computations via molecular biology

We show how to extend the recent result of Adleman [1] to use biological experiments to directly solve any NP problem. We then show how to use this method to speed up a large class of important problems.

Speeding up Training with Tree Kernels for Node Relation Labeling

We present a method for speeding up the calculation of tree kernels during training. The calculation of tree kernels is still heavy even with efficient dynamic programming (DP) procedures. Our method maps trees into a small feature space where the inner product, which can be calculated much faster, yields the same value as the tree kernel for most tree pairs. The training is sped up by using th...


Journal

Journal title: Machine Learning

Year: 2022

ISSN: 0885-6125, 1573-0565

DOI: https://doi.org/10.1007/s10994-022-06228-2